How to Estimate Change from Samples

Edith Cohen*+ Haim Kaplan + 



Abstract 

Measurements, snapshots of a system, traffic matrices, and activity logs are typically collected repeatedly. Difference
queries are then used to detect and localize changes for anomaly detection, monitoring, and planning. When the
data is sampled, as is often done to meet resource constraints, queries are processed over the sampled data. We are
not aware, however, of previously known estimators for $L_p$ ($p$-norm) distances that are accurate when only a small
fraction of the data is sampled.

We derive estimators for $L_p$ distances that are nonnegative and variance optimal in a Pareto sense, building on
our recent work on estimating general functions. Our estimators are applicable both when samples are independent and
when they are coordinated. For coordinated samples we present two estimators that trade off variance according to the
similarity of the data. Moreover, one of the estimators has the property that, for all data, its variance is close to the
minimum possible for that data.

We study the performance of our Manhattan and Euclidean distance ($p = 1, 2$) estimators on diverse datasets,
demonstrating scalability and accuracy: we obtain accurate estimates even when a small fraction of the data is sampled.
We also demonstrate the benefit of tailoring the estimator to characteristics of the dataset.

1 Introduction 

Data is commonly generated or collected repeatedly, where each instance has the form of a value assignment to a set 
of keys: Daily summaries of the number of queries containing certain keywords, transmitted bytes for IP flow keys, 
performance parameters (delay, throughput, or loss) for IP source destination pairs, environmental measurements for 
sensor locations, and requests for resources. In these examples, each set of values (instance) corresponds to a particular 
time or location. The universe of possible key values is fixed across instances but the values of a key are different in 
different instances. 

Difference queries between instances facilitate anomaly detection, monitoring, and planning by detecting, measuring,
and localizing change [6, 24]. Figure 1 shows two instances and difference measures that include the Euclidean
and Manhattan norms.
Difference between instances for keys $H$:

$L^p_{p\pm}(H) = \sum_{h \in H} RG_{p\pm}(h)$

Example queries:

$L_1([6]) = 20$
$L_1(\{4,5,6\}) = 7$
$L_2^2(\{1,5\}) = 32$
$L_2([6]) = \sqrt{134} \approx 11.6$
$L_{2-}([6]) = \sqrt{30} \approx 5.5$
$L_{2+}([6]) = \sqrt{104} \approx 10.2$



Figure 1: Difference measures between two instances on subset $H$ of the keys. Top table shows values $v_i(h)$ for key
$h \in [6]$ in instance $i \in [2]$. Bottom table shows single-key differences.



Data collection and warehousing is subject to limitations on storage, throughput, and energy required for transmission.
Even when the data is stored in full, exact processing of queries may be slow and resource consuming. Random
sampling of datasets is widely used as a means to obtain a flexible summary over which we can query the data while
meeting these limitations fl25l [33l HI [3] El [191 |20l [I] [l^ III [211 [TO1 [IS1 [8] [TT1 . 

Quality estimators are essential to scalable and accurate querying of sampled data. We seek estimators that are 
accurate when a small fraction of the data is sampled and are efficient to compute. Since differences are nonnegative, 
we are interested in nonnegative estimators. 

A sampling algorithm maps data values and a set of random bits to a set of sampled keys. We focus on weighted
sampling, meaning that the probability that a key is sampled depends on its value. When these values are skewed, sampling
schemes that favor heavier keys allow for more accurate estimates. When values are 0/1, keys with value 0, which
are often the majority of possible keys, have inclusion probability 0 and hence need not be explicitly considered. In
an IP router, for example, only a small subset of all possible keys (IP addresses or IP flow keys) is active.

As instances are dispersed in time or location, for scalability, the sample of one instance cannot depend on values
assumed in another [15, 12], but random bits can be public (when generated using random hash functions). The two
extremes of the joint distribution of samples of different instances are independent sampling (independent sets of 
random bits for each instance) and coordinated sampling (identical sets of random bits). With coordinated sampling, 
similar instances have similar samples whereas independent samples of identical instances can be completely disjoint. 
These two sampling schemes have different strengths and therefore we consider both: While coordination |]2] |3T] |28] 
l30l |4] [3] H [ID |20] [U [TD] [2T| QT| Q3) allows for tighter estimates of many basic queries including distinct counts (set 
union), quantile sums, and as we shall see, difference norms, |7][lj3[20][TT|[T5][T2), it also has pitfalls: (i) it results 
in unbalanced "burden" where same keys tend to be sampled across instances - an issue when, for example, sampling 
is performed prior to transmission to save power and (ii) variance on some queries - notably linear combinations of 
single-instance queries - is larger than with independent sampling ("total traffic on Monday-Wednesday", from daily 
summaries) - an issue if sample is primarily used for such queries. 

Contribution: 

We derive unbiased nonnegative estimators for $L_p^p$ and the downward-only and upward-only components $L^p_{p+}$ and
$L^p_{p-}$. The sampling of each instance can be Poisson or bottom-$k$ and samples of different instances can be independent
or coordinated.

We estimate $L_p^p$ as a sum over selected keys of nonnegative unbiased variance-optimal $RG_p$ estimates of the values
assumed by the key (see Figure 1 for examples and definitions). Variance optimality is in a Pareto sense: another
estimator with strictly lower variance on some data must have strictly higher variance on other data. $L_p$ is estimated
by the $p$th root of our $L_p^p$ estimate. The bias of the $L_p$ estimate decreases with the coefficient of variation of the
(unbiased) $L_p^p$ estimator, which decreases when more keys are selected. Similarly, the downward-only and upward-only
components are estimated as respective sums of $RG_{p-}$ and $RG_{p+}$ estimates.

Over independently-sampled instances, we derive an optimal monotone estimator for $RG_p$ of two values ($p > 0$).
Monotonicity means that the estimate value is non-decreasing with the information we can glean from the outcome.
Our construction adapts a technique we developed in [12].

Over coordinated samples, we apply [14, 13] to derive the L and U estimators, which are unbiased, nonnegative, and
variance-optimal. The L estimator has lower variance for data with small difference (range) whereas the U estimator 
performs better when the range is large. The L estimator is monotone and "variance competitive": on all data vectors, 
its variance is not too far off the minimum possible variance for the vector by a nonnegative unbiased estimator. 

For $p = 1, 2$, we compute closed-form expressions of the estimators and their variance and also tight bounds on the
"competitiveness" of the L estimator. We evaluate and compare the performance of our $L_1$ and $L_2$ difference estimators
on diverse data sets, which vary in size and magnitude of change, and relate observed performance to properties of the 
data. We also examine the behavior of the L and U estimators and provide guidelines to choosing between them based 
on properties of the data. 

Roadmap: Section 3 contains necessary background and definitions. We present difference estimators for independent
samples in Section 4 and for coordinated samples in Section 5. Section 6 contains an experimental evaluation.



2 Related work 

There was little prior work on estimating difference norms from samples. This is at least partly because, under
common schemes such as sampling via random accesses, there are strong lower bounds [15, 12] on estimation
quality, showing that most entries need to be sampled in order to obtain estimates with meaningful accuracy.

Our estimators, which are accurate even when only a small fraction of the data is sampled, critically depend on 
reproducibility of the "random bits" used by the sampling algorithm. More precisely, the inclusion probability of a 
key depends both on its value and a "random seed." Knowledge of the seed (which can be hash based) provides the 
estimator with additional power, since when a key is not sampled we are able to bound its value. 

Fortunately, known seeds can be integrated with basic sampling schemes when data entries can be individually 
examined by the sampling algorithm, which is commonly the case when samples are produced as summaries of large 
data sets. 

One can attempt to obtain nonnegative and unbiased estimates via classic inverse probabilities (Horvitz-Thompson
[23]): When the outcome reveals the value of the estimated quantity, the estimate is equal to the value divided by
the probability of such an outcome. The estimate is 0 otherwise. Inverse probability estimates, however, are inapplicable
to difference estimation over weighted samples, since they require that there is positive probability for outcomes
that reveal the exact value of the estimated quantity. With multiple instances and weighted sampling, keys that have
zero value in one instance and positive value in another have a positive contribution to the difference, but because zero
values are never sampled, there is zero probability of determining the value from the outcome.

The only pre-existing satisfactory difference estimator we are aware of is for $L_1$ over coordinated samples, which
uses the relation $|v_1 - v_2| = \max\{v_1, v_2\} - \min\{v_1, v_2\}$ to obtain an indirect estimate as the difference of two inverse
probability estimates for the maximum and minimum [15]. Our U estimator for $p = 1$ is a strengthening of this $L_1$
estimator.
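The max-minus-min idea can be illustrated with a minimal Python sketch. It assumes shared-seed PPS sampling with a single threshold `tau` for both instances (entry $i$ is sampled iff $v_i \ge u\tau$); the function names are hypothetical and the code is an illustration of the relation above, not the implementation of [15].

```python
def ht_l1_estimate(v1, v2, u, tau):
    """Estimate |v1 - v2| as (inverse-probability estimate of the max)
    minus (inverse-probability estimate of the min), shared seed u."""
    hi, lo = max(v1, v2), min(v1, v2)

    def inverse_probability(value):
        # The value is revealed iff the entry is sampled: value >= u * tau.
        if value <= 0 or value < u * tau:
            return 0.0
        # value / min(1, value / tau) simplifies to max(value, tau).
        return max(value, tau)

    # The max is revealed when the larger entry is sampled,
    # the min only when both entries are sampled.
    return inverse_probability(hi) - inverse_probability(lo)


def expected_estimate(v1, v2, tau):
    """Exact expectation over u ~ U(0, 1]; the estimate is constant on the
    seed interval where both entries are sampled and on the interval where
    only the larger entry is sampled."""
    hi, lo = max(v1, v2), min(v1, v2)
    p_lo = min(1.0, lo / tau)  # probability that both entries are sampled
    p_hi = min(1.0, hi / tau)  # probability that the larger entry is sampled
    both = ht_l1_estimate(v1, v2, 0.5 * p_lo, tau) if p_lo > 0 else 0.0
    only = ht_l1_estimate(v1, v2, 0.5 * (p_lo + p_hi), tau) if p_hi > p_lo else 0.0
    return p_lo * both + (p_hi - p_lo) * only
```

Under these assumptions the estimate is nonnegative on every outcome and its expectation equals $|v_1 - v_2|$, including when one of the values is zero.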

Lastly, difference estimation of streams was extensively studied using sketches of the streams (e.g. [17]), which are
synopses that are not sample-based. With sketches it is possible to obtain tighter estimates on the difference between 
complete streams but sketches have limited usefulness for other queries and do not preserve information on values of 
particular keys, and in particular, do not naturally support subset queries. 

3 Preliminaries 

We denote by $v_i(h) \in V$ the value of key $h \in K$ in instance $i \in [r]$ and by the vector $v(h)$ the values of key $h$ in all
instances. The exponentiated range function over values of a single key $v(h)$ is:

$RG_p(v) = (\max(v) - \min(v))^p \quad (p > 0)$   (1)

where $\max(v) = \max_{i \in [r]} v_i$ and $\min(v) = \min_{i \in [r]} v_i$ are the maximum and minimum entry values of the vector $v$.
We omit the subscript when $p = 1$. For two instances ($r = 2$) we use the following to isolate upward-only and
downward-only changes:

$RG_{p+}(v_1, v_2) = \max\{v_1 - v_2, 0\}^p$
$RG_{p-}(v_1, v_2) = \max\{v_2 - v_1, 0\}^p$

For a selected subset $H \subset K$ of keys, we define

$L_p^p(H) = \sum_{h \in H} RG_p(v(h))$ .   (2)

The $p$-norm of the difference of two instances ($r = 2$) is $L_p(H) = (L_p^p(H))^{1/p} \equiv \|v_1(H) - v_2(H)\|_p$. For upward-only
and downward-only change we use $L^p_{p\pm}(H) = \sum_{h \in H} RG_{p\pm}(v_1(h), v_2(h))$.
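As a concrete illustration of these definitions on unsampled data, the following Python sketch computes the single-key primitives and the sum aggregates for two instances. The function names and the example values are hypothetical (they are not the values of Figure 1).

```python
def rg(v1, v2, p=1.0):
    """Exponentiated range RG_p of one key over two instances."""
    return abs(v1 - v2) ** p


def rg_plus(v1, v2, p=1.0):
    """RG_{p+}: counts the change only where instance 1 exceeds instance 2."""
    return max(v1 - v2, 0.0) ** p


def rg_minus(v1, v2, p=1.0):
    """RG_{p-}: counts the change only where instance 2 exceeds instance 1."""
    return max(v2 - v1, 0.0) ** p


def lpp(inst1, inst2, keys, p=1.0, part=rg):
    """L_p^p(H): sum of a single-key primitive over the selected keys H.
    Keys missing from an instance are treated as having value 0."""
    return sum(part(inst1.get(h, 0.0), inst2.get(h, 0.0), p) for h in keys)


def lp(inst1, inst2, keys, p=1.0, part=rg):
    """L_p(H): the p-th root of L_p^p(H)."""
    return lpp(inst1, inst2, keys, p, part) ** (1.0 / p)


# Hypothetical instances, keyed by integer key ids.
inst1 = {1: 5.0, 2: 0.0, 3: 10.0, 4: 3.0}
inst2 = {1: 7.0, 2: 4.0, 3: 10.0, 4: 0.0}
```

For $r = 2$ the upward-only and downward-only parts always add up to the full $L_p^p(H)$, since each key contributes to exactly one of them.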

When data is sampled, we estimate $L_p^p$, $L^p_{p+}$, and $L^p_{p-}$ by summing estimates for the respective single-key
primitives ($RG_p$, $RG_{p+}$, and $RG_{p-}$) over keys in $H$. We use unbiased estimators for the primitives, which result, from
linearity of expectation, in unbiased estimates for the sums. Since only a fraction of keys is sampled, our estimates for
each primitive generally have high CV (coefficient of variation, which is the ratio of the square root of the variance
to the mean). Since estimates for different keys are (pairwise) independent (or nonpositively correlated), variance is
(sub)-additive and the CV decreases when $|H|$ increases, allowing for accurate estimates when $H$ is sufficiently large.
Unbiasedness of the single-key estimators is essential here. Since unbiasedness is not preserved under exponentiation,
we need to carefully tailor the estimators to the exponent value $p$.

Figure 2: Independent and coordinated samples of two instances. Seed selection and ratios. Poisson PPS samples of
expected size 3 and priority samples of size 3.

We estimate $L_p(H)$ by taking the $p$th root of the estimate for $L_p^p(H)$. This estimate is biased, but the error is small
when the CV of our $L_p^p(H)$ estimate is small.
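A rough second-order sketch shows why both statements hold. It assumes that the single-key estimates are independent and that the estimate $X$ of $L_p^p(H)$, with mean $\mu > 0$, is concentrated enough for a Taylor expansion of $x^{1/p}$ around $\mu$ to be meaningful; the symbols $X$ and $\mu$ are introduced only for this illustration.

```latex
\mathrm{CV}[X]^2
  = \frac{\sum_{h \in H} \mathrm{VAR}\bigl[\hat{RG}_p(h)\bigr]}
         {\bigl(\sum_{h \in H} RG_p(v(h))\bigr)^2},
\qquad
E\bigl[X^{1/p}\bigr]
  \approx \mu^{1/p}\Bigl(1 + \tfrac{1}{2}\cdot\tfrac{1}{p}\bigl(\tfrac{1}{p} - 1\bigr)\,\mathrm{CV}[X]^2\Bigr).
```

For $|H|$ keys with identical mean and variance, the first expression gives $\mathrm{CV}[X] = \mathrm{CV}_{\text{single}}/\sqrt{|H|}$, and for $p = 2$ the second gives a relative bias of about $-\mathrm{CV}[X]^2/8$, which vanishes as more keys are selected.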

Sampling scheme of instances. Our estimators apply to Poisson sampling, where keys are sampled independently, and
bottom-$k$ (order) sampling, which yields a sample size of exactly $k$. Bottom-$k$ sampling includes Priority (Sequential
Poisson) sampling and weighted sampling without replacement 1301 |29l |27l l9l [181 [101 [PD . We state these classic 
schemes in a way which allows the "random" bits to be reproducible, first by the sampling algorithm, to facilitate 
coordination, and also by the estimator, to yield strong estimates. 

We reuse notation from [12, 14, 13]. Sampling is specified by a set of nondecreasing continuous functions $\tau_i^h$,
defined on the interval $[0, 1]$. Each key $h$ in instance $i$ is associated with a random seed value $u_i(h) \sim U[0, 1]$ chosen
uniformly at random. To make randomization reproducible, $u_i(h)$ is generated via a random hash function (pairwise
independence and fewer bits in the representation of the seed suffice, but we skip these details here). With Poisson
sampling,

$h$ is sampled in instance $i$ $\iff$ $v_i(h) \ge \tau_i^h(u_i(h))$ .   (3)

A bottom-$k$ sample of instance $i$ includes the $k$ keys with largest ratios $r_i(h) = v_i(h)/\tau_i^h(u_i(h))$. Samples of different
instances are independent when the seeds $u_i(h)$ are independent for all $i$. They are coordinated (shared-seed) if the
same seed is used for the same key in all instances, that is, $\forall h \in K, \forall i \in [r], u_i(h) = u_1(h) \equiv u(h)$.

When threshold functions have the form $\tau_i^h(x) = ax$ (to simplify notation we use $\tau_i^h(x) \equiv \tau_i^h x$, treating $\tau_i^h$ as a
constant), the corresponding Poisson samples are PPS (Probability Proportional to Size) [22]. Strictly, PPS sampling
assumes that the $\tau_i^h$ are consistent across the instance, that is, $\tau_i^h \equiv \tau_i$, but our analysis is general.

Sampling can be performed efficiently both when the threshold $\tau_i$ is fixed and when it is set adaptively by a streaming
algorithm to achieve a specified expected sample size $E[|S|] = \sum_{h \in K} \min\{1, v_i(h)/\tau_i\}$. As an example, to obtain
a PPS Poisson sample of expected size $E[|S|] = 3$ for the instances in Figure 1 we use $\tau_1 = 29/3$ (instance 1) and
$\tau_2 = 33/3 = 11$ (instance 2).

The bottom-$k$ sample obtained with $\tau_i^h(x) = x$ is a priority (sequential Poisson) sample [27, 18, 32]. Weighted
sampling without replacement is obtained with thresholds $\tau_i^h(x) = -\ln x$. Figure 2 shows PPS and priority samples
obtained with respect to random seeds for the two instances in Figure 1.
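The sampling schemes above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: seeds come from SHA-256 of the key (one possible choice of hash), coordination is obtained by reusing the same `salt` across instances and independence by using a different `salt` per instance, and all function names and example values are hypothetical.

```python
import hashlib


def seed(key, salt=""):
    """Reproducible seed in (0, 1] derived from a hash of the key."""
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2.0 ** 64


def pps_threshold(values, k):
    """Threshold tau such that sum_h min(1, v_h / tau) = k.
    Assumes k is smaller than the number of positive values."""
    vs = sorted((v for v in values if v > 0), reverse=True)
    total = sum(vs)
    for j, v in enumerate(vs):
        if k - j <= 0:
            break
        # Tentatively give the j largest values inclusion probability 1.
        tau = total / (k - j)
        if v <= tau:
            return tau
        total -= v
    raise ValueError("k must be smaller than the number of positive values")


def poisson_pps(instance, tau, salt=""):
    """Poisson PPS sample: key h is included iff v(h) >= u(h) * tau."""
    return {h: v for h, v in instance.items() if v >= seed(h, salt) * tau}


def priority_sample(instance, k, salt=""):
    """Bottom-k (priority) sample: the k keys with largest ratio v(h) / u(h)."""
    ranked = sorted(instance, key=lambda h: instance[h] / seed(h, salt), reverse=True)
    return {h: instance[h] for h in ranked[:k] if instance[h] > 0}
```

With a shared `salt` and a common threshold, raising the value of every key can only add keys to the Poisson PPS sample, which is the sense in which coordinated samples of similar instances are similar.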

Sampling model (single key): The exponentiated range estimators are applied to samples of the same key $h$ across
instances $i \in [r]$. That is, we work with the restriction of the sample to one key at a time.

With Poisson sampling, for key $h$, we can obtain from the sample the values of the sampled entries of key $h$. The
seed vector $u \equiv u(h)$ and the thresholds $\tau \equiv (\tau_1^h, \ldots, \tau_r^h)$ are all available to the estimator. With bottom-$k$
sampling of instances, the threshold is not readily available, so we work with effective thresholds as follows. We
condition the inclusion of $h$ on seeds of other keys being fixed [10, 18] and define $\tau_i^h \equiv \tau_i$ to be the inverse of the
$k$th largest ratio $r_i(h')$ over keys $h'$ in instance $i$ with $h$ excluded (which is the $(k+1)$st largest ratio over all keys in the instance).
From here onward, we omit from the notation the reference to the key $h$ and focus on exponentiated range estimators.
We return to sum aggregates only for the experiments in Section 6.

The data (values of a single key $h$ in instances $i \in [r]$) is $v = (v_1, v_2, \ldots, v_r) \in \mathbf{V} = V^r$ (we mostly assume
$V = \mathbb{R}_{\ge 0}$). The outcome $S$ depends on the data $v$, random seed vector $u$ and threshold functions $\tau$. The $i$th entry is
included in $S$ if and only if its value is at least $\tau_i(u_i)$:

$i \in S \iff v_i \ge \tau_i(u_i)$ .

The set of all data vectors consistent with outcome $S$ (we treat $u \in [0, 1]^r$ as included with $S$) is

$V^*(S) = \{z \in \mathbf{V} \mid S = S(u, z)\}$

$\phantom{V^*(S)} = \{z \mid \forall i \in [r],\ (i \in S \wedge z_i = v_i) \vee (i \notin S \wedge z_i < \tau_i(u_i))\}$ .

We can equivalently define the outcome as the set V*(S) since it captures all information available to the estimator. 
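The outcome and its set of consistent data vectors can be made concrete with a small Python sketch. It assumes PPS thresholds, so that $\tau_i(u_i) = u_i \tau_i$; the function names are hypothetical.

```python
def outcome(v, u, tau):
    """Sampled entries of one key: index -> value, for entries with
    v[i] >= u[i] * tau[i] (PPS thresholds assumed)."""
    return {i: v[i] for i in range(len(v)) if v[i] >= u[i] * tau[i]}


def consistent(z, sampled, u, tau):
    """True iff data vector z belongs to V*(S): it agrees with every sampled
    value and lies strictly below the threshold on every unsampled entry."""
    return all(
        (z[i] == sampled[i]) if i in sampled else (z[i] < u[i] * tau[i])
        for i in range(len(z))
    )
```

For example, with `v = (6, 2)`, seeds `(0.5, 0.5)` and thresholds `(10, 10)`, only the first entry is sampled, and the outcome reveals that the second value is below 5.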

Estimators: An estimator $\hat f$ for a function $f$ on $\mathbf{V}$ is a numeric function applied to the outcome. To be well defined in
continuous domains, $\hat f$ should be (Lebesgue) integrable. For exponentiated ranges, which are nonnegative quantities, we are
interested in estimators that are nonnegative: $\hat f(S) \ge 0$ for all $S$. As explained earlier, since we sum many estimates,
we would like each estimate to be unbiased: $E[\hat f \mid v] = f(v)$. Other properties we seek are bounded variance on
all data, and variance-optimality (respectively, variance+-optimality): there is no (resp., nonnegative) estimator with
the same or lower variance on all data and strictly lower variance on some data. An intuitive property that is sometimes
desirable is monotonicity: the estimate value is non-decreasing with the information on the data that we can glean from the
outcome, $V^*(S) \subset V^*(S') \implies \hat f(S) \ge \hat f(S')$.

Order-based variance optimality: Given a partial order $\prec$ on $\mathbf{V}$, an estimator $\hat f$ is $\prec$-optimal (respectively,
$\prec^+$-optimal) if it is unbiased (resp., nonnegative) and for all data $v$, minimizes variance for $v$ conditioned on the variance
being minimized for all preceding vectors. Formally, there is no other unbiased (resp., nonnegative) estimator that
has strictly lower variance on some data $v$ and at most the variance of $\hat f$ on all vectors that precede $v$. Order-based
optimality implies variance optimality.

4 Independent PPS sampling 

The outcome $S(u, v)$ is determined by the data $v$ and a random seed vector $u \in [0, 1]^r$ with independent entry values.

$V^*(S) = \{z \mid \forall i \in [r],\ (i \in S \wedge z_i = v_i) \vee (i \notin S \wedge z_i < \tau_i(u_i))\}$ .

We derive the L estimator, $\hat{RG}_p^{(L)}$, which is the unique symmetric, monotone, and variance+-optimal estimator, by
applying our framework from [12] to construct the estimator for a function $f$. The application has two ingredients:
The first is a method to construct a $\prec$-optimal estimator $\hat f^{(\prec)}$ for $f$ with respect to a partial order $\prec$ on data vectors. The
second ingredient is to identify a partial order $\prec$ so that the estimator $\hat f^{(\prec)}$ is nonnegative, and therefore $\prec^+$-optimal.

Review of the construction of $\hat f^{(\prec)}$. The determining vector $\phi(S)$ of an outcome $S$ is a $\prec$-minimal vector in the
closure of consistent vectors: $\phi(S) = \min_\prec \mathrm{cl}(V^*(S))$. Accordingly, we can specify the set $\phi^{-1}(v)$ of all outcomes
determined by $v$ and the set $S_0(v)$ of all outcomes that precede $v$, that is, are consistent with $v$ but determined by a vector that
precedes $v$.

$\phi^{-1}(v) = \{S \mid v = \phi(S)\}$

$S_0(v) = \{S \mid v \in V^*(S) \wedge \phi(S) \prec v\}$

The estimator $\hat f^{(\prec)}$ is the same for all outcomes with the same determining vector, and therefore we can specify it in
terms of the determining vector, $\hat f^{(\prec)}(S) \equiv \hat f^{(\prec)}(\phi(S))$. We now state constraints that must be satisfied by $\hat f^{(\prec)}$. The
contribution of the outcomes $S_0(v)$ to the expectation $E[\hat f^{(\prec)} \mid v]$ is

$f_0(v) = E[\hat f^{(\prec)} \mid S_0(v), v]\, PR[S_0(v) \mid v]$ ,

where $E[\hat f^{(\prec)} \mid S_0(v), v]$ is the expectation of $\hat f^{(\prec)}$ on outcomes that precede $v$ and $PR[S_0(v) \mid v]$ is the probability that
the outcome precedes $v$ when the data is $v$. For all vectors $v \in \mathbf{V}$, we require (this is necessary for unbiasedness) that

If $PR[\phi^{-1}(v) \mid v] = 0$ then $f_0(v) = f(v)$ .   (4)

Else, $\hat f^{(\prec)}(v) = \frac{f(v) - f_0(v)}{PR[\phi^{-1}(v) \mid v]}$ ,   (5)

where $PR[\phi^{-1}(v) \mid v]$ is the probability that the outcome is determined by $v$ when the data vector is $v$.

Choice of $\prec$. For $RG_p$, we choose $\prec$ so that the relation between vectors is according to an increasing lexicographic
order on the lists $L(v)$, which we define to be the sorted multiset of differences $\{v_i - \min(v) \mid i \in [r]\}$. The $\prec$-minimum
vectors are those with all entries having equal values. With our choice of $\prec$, the determining vector $\phi(S)$
is as follows: if $S = \emptyset$ (no entries are sampled), $\phi(S) = \mathbf{0}$. Otherwise, if $h \in S$ then $\phi(S)_h = v_h$ and if $h \notin S$
then $\phi(S)_h = \min\{\min_{i \in S} v_i, \min_{i \notin S} u_i \tau_i\}$. The mapping of outcomes to determining vectors for $r = 2$ is shown in
Table 1 (Right).

Table 1: Left: Estimator $\hat{RG}_p^{(L)}$ for $p > 0$ and $r = 2$ over independent samples, stated as a function of the determining
vector $(v_1, v_2)$ when $v_1 \ge v_2$ (case $v_2 \ge v_1$ is symmetric). Right: mapping of outcomes to determining vectors.



Derivation of $\hat{RG}_p^{(L)}$. To obtain the estimator, we solve (5) for all $v$ such that $PR[\phi^{-1}(v) \mid v] > 0$ and verify that
$f_0(v) = f(v)$ for all $v$ such that $PR[\phi^{-1}(v) \mid v] = 0$. The estimator $\hat{RG}_p^{(L)}$ ($p > 0$) when $r = 2$ is specified in
Table 1 through a mapping of determining vectors to estimate values. We can verify that for all $p > 0$, the estimator
$\hat{RG}_p^{(L)}$ is nonnegative, monotone ($\hat{RG}_p^{(L)}(v, x)$ is non-increasing for $x \in (0, v]$) and has finite variances (this follows from
$\int_0^v \hat{RG}_p^{(L)}(v, x)^2\, dx < \infty$). Table 2 shows explicit expressions of $\hat{RG}^{(L)}$ and $\hat{RG}_2^{(L)}$.

Vectors with $PR[\phi^{-1}(v) \mid v] = 0$ are exactly those with one positive and one zero entry and we can verify that (4) is
satisfied, that is, that $\hat{RG}_p^{(L)}$ ($p > 0$) is unbiased on these vectors. We conjecture that a solution of (5) for $r > 2$ is also
nonnegative and monotone.



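$\hat{RG}^{(L)}(v)$ for determining vector $v = (v_1, v_2)$:

  $v_1 \ge v_2 \ge \tau_2$:   $\frac{\tau_1}{\min\{\tau_1, v_1\}} (v_1 - v_2)$
  $v_1 \ge v_2$, $v_2 < \tau_2$:   $\frac{\tau_1}{\min\{\tau_1, v_1\}} \left( \tau_2 \ln\frac{\min\{v_1, \tau_2\}}{v_2} + \max\{0, v_1 - \tau_2\} \right)$

$\hat{RG}_2^{(L)}(v)$ for determining vector $v = (v_1, v_2)$:

  $v_1 \ge v_2 \ge \tau_2$:   $\frac{\tau_1}{\min\{\tau_1, v_1\}} (v_1 - v_2)^2$
  $v_1 \ge v_2$, $v_2 < \tau_2$:   $\frac{2 \tau_1 \tau_2}{\min\{\tau_1, v_1\}} \left( v_2 - \min\{v_1, \tau_2\} + v_1 \ln\frac{\min\{v_1, \tau_2\}}{v_2} \right) + \frac{\tau_1}{\min\{\tau_1, v_1\}} \max\{0, v_1 - \tau_2\}^2$

Table 2: Explicit form of estimators $\hat{RG}^{(L)}$ and $\hat{RG}_2^{(L)}$ for $r = 2$ over independent samples. Estimator is stated as a
function of the determining vector $(v_1, v_2)$ when $v_1 \ge v_2$ (case $v_2 \ge v_1$ is symmetric).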



We now provide details on the derivation. We consider vectors $v$ in increasing $\prec$ order and solve (5) for $\hat f^{(\prec)}(v)$ on
outcomes with determining vector $v$.

We express $\hat{RG}_p^{(L)}(v, v - \Delta)$ ($p > 0$, $\Delta \in [0, v]$) as a function of $\tau_1$ and $\tau_2$.

• Case: $v - \Delta \ge \tau_2$. The estimate can be positive only when $u_1 \tau_1 \le v$, which happens with probability $\min\{1, v/\tau_1\}$.
We solve the equality $\Delta^p = \min\{1, v/\tau_1\}\, \hat{RG}_p^{(L)}$, obtaining

$\hat{RG}_p^{(L)}(v, v - \Delta) = \frac{\tau_1}{\min\{\tau_1, v\}} \Delta^p$ .

• Case: $v - \Delta < \tau_2$. From (5) we obtain an integral equation for $\hat{RG}_p^{(L)}(v, v - \Delta)$. Taking a partial derivative with
respect to $\Delta$ turns it into a differential equation, which we solve using the boundary value at $\Delta = \max\{0, v - \tau_2\}$.
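The second case can be sketched as follows, under the sampling model of this section (independent seeds, PPS thresholds $\tau_1, \tau_2$). The shorthand $x = v - \Delta < \tau_2$, $g(x) = \hat{RG}_p^{(L)}(v, x)$, $q_1 = \min\{1, v/\tau_1\}$, and $m = \min\{v, \tau_2\}$ is introduced only for this illustration. The outcomes that contribute a positive estimate are those where the larger entry is sampled and the smaller entry is either sampled or bounded by some $y \in (x, m)$. This is a reconstruction from constraint (5), offered as a plausible reading rather than a quotation of the original steps:

```latex
(v - x)^p = q_1 \left[ \frac{x}{\tau_2}\, g(x) + \frac{1}{\tau_2} \int_x^{m} g(y)\, dy \right]
\;\Longrightarrow\;
g'(x) = -\frac{p\, \tau_2\, (v - x)^{p-1}}{q_1\, x},
\qquad
g(m) = \frac{\max\{0, v - \tau_2\}^p}{q_1},
\\[4pt]
g(x) = \frac{1}{q_1} \left[ \max\{0, v - \tau_2\}^p
       + p\, \tau_2 \int_x^{m} \frac{(v - y)^{p-1}}{y}\, dy \right].
```

For $p = 1$ and $\tau_1 = \tau_2 = \tau$ this evaluates to $\tau \ln(\tau/x) + v - \tau$ when $v \ge \tau$ and to $(\tau^2/v) \ln(v/x)$ when $v \le \tau$.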
The special case $\tau_1 = \tau_2 = \tau$: The estimators $\hat{RG}^{(L)}$ and $\hat{RG}_2^{(L)}$ as a function of the determining vector and their
variance are provided in Tables 3 and 4. For data vectors where $v_1 \ge v_2 \ge \tau$, $\hat{RG}^{(L)} = v_1 - v_2$ and $VAR[\hat{RG}^{(L)}] = 0$.
If $v_1 \ge \tau \ge v_2$, $\hat{RG}^{(L)} = \tau \ln\frac{\tau}{v_2} + v_1 - \tau$, and $VAR[\hat{RG}^{(L)}] = -2\tau v_2 \ln\frac{\tau}{v_2} - v_2^2 + \tau^2$. Finally, if $v_2 \le v_1 \le \tau$,
$\hat{RG}^{(L)}(v_1, v_2) = \frac{\tau^2}{v_1} \ln\frac{v_1}{v_2}$.

Table 3: Estimator $\hat{RG}^{(L)}$ and its variance for independent samples.

Similarly, for $v_2 \le v_1 \le \tau$, $\hat{RG}_2^{(L)}(v_1, v_2) = 2\tau^2 \left( \ln\frac{v_1}{v_2} - \frac{v_1 - v_2}{v_1} \right)$. The variance when $v_1 \ge \tau$ is the same as for
shared-seed sampling (see next section).
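The closed forms for $p = 1$ and a common threshold can be checked numerically. The Python sketch below assumes $r = 2$, independent seeds, PPS with a single `tau`, and the determining-vector rule described in this section; the function names and the midpoint-rule integration over the seed square are choices made for this illustration only.

```python
import math


def determining_vector(v, u, tau):
    """Determining vector for r = 2: sampled entries keep their value,
    unsampled entries get the smallest value or bound seen in the outcome."""
    sampled = [v[i] >= u[i] * tau for i in range(2)]
    if not any(sampled):
        return (0.0, 0.0)
    floor = min(v[i] if sampled[i] else u[i] * tau for i in range(2))
    return tuple(v[i] if sampled[i] else floor for i in range(2))


def rg_l(phi, tau):
    """L estimate for p = 1 as a function of the determining vector."""
    hi, lo = max(phi), min(phi)
    if hi == lo:
        return 0.0
    if lo >= tau:
        return hi - lo
    if hi >= tau:
        return tau * math.log(tau / lo) + hi - tau
    return tau * tau / hi * math.log(hi / lo)


def mean_and_variance(v, tau, n=400):
    """Midpoint-rule approximation of the mean and variance of the estimate
    over independent uniform seeds (u1, u2)."""
    total = total_sq = 0.0
    for i in range(n):
        for j in range(n):
            u = ((i + 0.5) / n, (j + 0.5) / n)
            est = rg_l(determining_vector(v, u, tau), tau)
            total += est
            total_sq += est * est
    mean = total / (n * n)
    return mean, total_sq / (n * n) - mean * mean
```

On example data such as `v = (6, 2)` or `v = (15, 2)` with `tau = 10`, the approximated mean agrees with $v_1 - v_2$ up to discretization error, and for $v_1 \ge \tau \ge v_2$ the variance agrees with $\tau^2 - 2\tau v_2 \ln(\tau/v_2) - v_2^2$.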



5 Shared-seed sampling 

The outcome $S(u, v)$ is determined by the data $v$ and a scalar seed value $u \in (0, 1]$, drawn uniformly at random: entry
$i$ is included in $S$ if and only if $v_i \ge \tau_i(u)$, where $\tau_i$ is a non-decreasing continuous function with range containing
$(\min V, \max V)$.
$\hat{RG}_2^{(L)}(v_1, v_2)$ for data $v_1 \ge v_2$:

  $v_1 \ge v_2 \ge \tau$:   $(v_1 - v_2)^2$
  $v_1 \ge \tau \ge v_2$:   $v_1^2 - \tau^2 - 2\tau(v_1 - v_2) + 2\tau v_1 \ln\frac{\tau}{v_2}$
  $v_2 \le v_1 \le \tau$:   $2\tau^2 \left( \ln\frac{v_1}{v_2} - \frac{v_1 - v_2}{v_1} \right)$



Data vi > V2 


var[r<#°|«] 


Vi > V2 > T 







Ui > r > 't?2 


— 4?;it;2T(2vi — t^) In ^7- 
—6viv 2 r — Av 1 v 2 










-v\ +4t)iw| + 4u 2 (t) 2 - 2i>i(t) 3 




V2 < 1>l < T 


-4(r) 2 (2i>! - 1-2)^2 ln(^i-) 


-t-2) 4 



Table 4: Estimator $\hat{RG}_2^{(L)}$ and its variance for independent samples.



Our $\hat{RG}_p$ ($p > 0$) estimators are derived by applying our general theory on shared-seed estimators [14, 13]. We
derive two unbiased nonnegative variance+-optimal estimators, the L estimator $\hat{RG}_p^{(L)}$ and the U estimator $\hat{RG}_p^{(U)}$.
As a reference point, we also consider for each vector $v$ the $v$-optimal estimate values $\hat{RG}_p^{(v)}$. An estimator is
$v$-optimal if amongst all estimators that are nonnegative and unbiased for all data, it has the minimum possible variance
when the data is $v$. It turns out that the values assumed by a $v$-optimal estimator on outcomes consistent with $v$ are
unique, up to equivalence, and we refer to them as the $v$-optimal estimates.

We compute closed form expressions of estimators and variances when $\tau$ has all entries equal (to the scalar $\tau$).
The expressions for the upward-only and downward-only variants follow those for $RG_p$ and are omitted.

Structure of the set of outcomes. The set of data vectors consistent with outcome $S(u, v)$ is

$V^*(S) = \{z \mid \forall i \in [r],\ (i \in S \wedge z_i = v_i) \vee (i \notin S \wedge z_i < \tau_i(u))\}$ .

From the outcome $S(u, v)$, we can determine not only $V^*(S(u, v))$ but also $V^*(S(x, v))$ for all $x \ge u$. Observe that
the sets $V^*(S(u, z))$ are the same for all consistent data vectors $z \in V^*(S(u, v))$. Fixing the data $v$, the upper bounds
$\tau_i(u)$ on values of entries that are not sampled are non-decreasing functions of $u$ and therefore, the set $V^*(S(u, v))$ is
non-decreasing with $u$ and the set of sampled entries is non-increasing. This means that the information on the data
that we can glean from the outcome increases when $u$ decreases.

The lower bound function. To proceed, we need to define the lower bound function $\underline{RG}_p$:

$\underline{RG}_p(S) = \inf_{v \in V^*(S)} RG_p(v)$ ,

which maps an outcome $S$ to the infimum of $RG_p$ values on vectors that are consistent with the outcome. For $RG$, the
lower bound is the difference between a lower bound on the maximum entry and an upper bound on the minimum
entry.

$\underline{RG}(S) = \max_{i \in S} v_i - \min\{\min_{i \in S} v_i, \min_{i \notin S} \tau_i(u)\}$ .

The lower bound on $RG_p$ is the $p$th power of the respective bound on $RG$, that is, $\underline{RG}_p(S) = \underline{RG}(S)^p$. For $S(u, v)$, we
use the notation $\underline{RG}_p(S) \equiv \underline{RG}_p^{(v)}(u)$, which is convenient when we fix $v$ while varying the seed $u$.
For PPS sampling with all-entries-equal $\tau$:

  condition                                    $|S|$                 $\underline{RG}(S)$
  $u > \max(v)/\tau$                           $0$                   $0$
  $\max(v)/\tau \ge u > \min(v)/\tau$          $1, \ldots, r-1$      $\max(v) - u\tau$
  $u \le \min(v)/\tau$                         $r$                   $RG(v)$



$v$-optimality. For a data vector $v$, we can determine when a nonnegative and unbiased estimator $\hat{RG}_p$ has minimum
variance on $v$. We use the notation $H^{(v)}_{RG_p}(u)$ for the lower boundary of the convex hull (lower hull) of $\underline{RG}_p^{(v)}(u)$. This
function is monotone non-increasing and therefore differentiable almost everywhere.

Theorem 5.1. [14] A nonnegative unbiased estimator $\hat{RG}_p$ minimizes $VAR[\hat{RG}_p \mid v]$ $\iff$ $H^{(v)}_{RG_p}(u) = \int_u^1 \hat{RG}_p(x, v)\, dx$
$\iff$ almost everywhere

$\hat{RG}_p^{(v)}(u) = -\frac{d H^{(v)}_{RG_p}(u)}{du}$ .   (7)

Note that the specification (7) of the $v$-optimal estimates on outcomes consistent with $v$ is unique (in an almost
everywhere sense). The estimates (7) are monotone non-increasing in $u$. Observe that the specifications for different
vectors with overlapping sets of consistent outcomes can be inconsistent and thus, it is not possible to obtain a single
universally optimal estimator that is $v$-optimal for all $v$.

We can now specify $\hat{RG}_p^{(v)}$ for PPS sampling with all-entries-equal $\tau$. The function $\underline{RG}_p^{(v)}(u)$ is $\max\{0, \max(v) - u\tau\}^p$
for $u \ge \frac{\min(v)}{\tau}$ and equal to $RG_p(v)$ for $u \le \frac{\min(v)}{\tau}$. Therefore for $u \ge \frac{\max(v)}{\tau}$ the lower hull is $H^{(v)}_{RG_p}(u) = 0$
and $\hat{RG}_p^{(v)}(u) = 0$.

For $p \le 1$, the function is concave for $u \in [\frac{\min(v)}{\tau}, \frac{\max(v)}{\tau}]$. The lower hull is therefore a linear function for
$u \le \frac{\max(v)}{\tau}$: when $\max(v) \le \tau$, $H^{(v)}_{RG_p}(u) = RG_p(v)\left(1 - \frac{u\tau}{\max(v)}\right)$ and when $\max(v) \ge \tau$, $H^{(v)}_{RG_p}(u) = RG_p(v) -
u\left(RG_p(v) - (\max(v) - \tau)^p\right)$. The $v$-optimal estimator is therefore constant for $u \le \min\{1, \frac{\max(v)}{\tau}\}$: $\hat{RG}_p^{(v)}(u) =
RG_p(v) \frac{\tau}{\max(v)}$ when $\max(v) \le \tau$, and $\hat{RG}_p^{(v)}(u) = RG_p(v) - (\max(v) - \tau)^p$ when $\max(v) \ge \tau$.

For $p > 1$, $\underline{RG}_p^{(v)}(u)$ is convex for $u \in [\frac{\min(v)}{\tau}, \frac{\max(v)}{\tau}]$. Geometrically, the lower hull follows the lower bound
function for $u > \alpha$, where $\alpha$ is the point where the slope of the lower bound function is equal to the slope of a line
segment connecting the current point to the point $(0, RG_p(v))$. For $u \le \alpha$, the lower hull follows this line segment and
is linear. Formally, the point $\alpha$ is the solution of

$RG_p(v) = (\max(v) - x\tau)^{p-1} (p x \tau + \max(v) - x\tau)$ .

If there is no solution $\alpha \in [\frac{\min(v)}{\tau}, \min\{1, \frac{\max(v)}{\tau}\}]$, we use $\alpha = \min\{1, \frac{\max(v)}{\tau}\}$. The estimates for $u \in [\alpha, \min\{1, \frac{\max(v)}{\tau}\}]$
are $\hat{RG}_p^{(v)}(u) = -\frac{d \underline{RG}_p^{(v)}(u)}{du} = p\tau(\max(v) - u\tau)^{p-1}$ and for $u \le \alpha$, $\hat{RG}_p^{(v)}(u) = \frac{RG_p(v) - (\max(v) - \alpha\tau)^p}{\alpha}$.

Figure 3 (top) illustrates $\underline{RG}_p^{(v)}$ and the corresponding lower hull $H^{(v)}_{RG_p}$ for example vectors with $p \in \{0.5, 1, 2\}$.
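The following Python sketch evaluates the lower bound function and the $v$-optimal estimates for shared-seed PPS with a common threshold. It is limited, by assumption, to data with $\max(v) \le \tau$ and to $p \le 1$ or $p = 2$; for $p = 2$ the tangency condition reduces to $\alpha\tau = \sqrt{\min(v)(2\max(v) - \min(v))}$. Function names and the midpoint-rule integration are choices made for this illustration.

```python
import math


def lower_bound(u, v, tau, p):
    """Lower bound on RG_p(v) given the outcome at seed u (common tau)."""
    hi, lo = max(v), min(v)
    if u * tau <= lo:
        return (hi - lo) ** p
    return max(0.0, hi - u * tau) ** p


def v_optimal(u, v, tau, p):
    """v-optimal estimate at seed u; assumes max(v) <= tau and p <= 1 or p == 2."""
    hi, lo = max(v), min(v)
    rgp = (hi - lo) ** p
    if u > hi / tau:
        return 0.0
    if p <= 1:
        return rgp * tau / hi
    if p == 2:
        alpha = math.sqrt(lo * (2.0 * hi - lo)) / tau
        if u >= alpha:
            return 2.0 * tau * (hi - u * tau)
        return (rgp - (hi - alpha * tau) ** 2) / alpha
    raise NotImplementedError("only p <= 1 and p == 2 are sketched here")


def tail_integral(u0, v, tau, p, power=1, n=2000):
    """Midpoint-rule integral of v_optimal(u)**power over u in [u0, 1]."""
    h = (1.0 - u0) / n
    return sum(v_optimal(u0 + (i + 0.5) * h, v, tau, p) ** power for i in range(n)) * h
```

Integrating the estimate over the whole seed range recovers $RG_p(v)$ (unbiasedness on $v$), the integral of its square yields the minimum variance for $v$ after subtracting $RG_p(v)^2$, and the tail integral from $u$ to 1 stays below the lower bound function, as expected of a lower hull.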

From $\hat{RG}_p^{(v)}$, we can compute for any vector $v$ the minimum possible variance attainable for it by an unbiased
nonnegative estimator:

$VAR[\hat{RG}_p^{(v)} \mid v] = \int_0^1 \hat{RG}_p^{(v)}(u)^2\, du - RG_p(v)^2$ .   (8)

We use the $v$-optimal estimates as a reference point to measure the "variance competitiveness" of estimators.
The L estimator.

Theorem 5.2. [13] The estimator $\hat{RG}_p^{(L)}$ ($p > 0$), specified as the solution of

$\forall v\, \forall \rho, \quad \hat{RG}_p^{(L)}(\rho, v) = -\int_\rho^1 \frac{1}{u} \frac{d \underline{RG}_p^{(v)}(u)}{du}\, du$   (9)

has the following properties:

• It is nonnegative and unbiased.

• It is the unique (up to equivalence) variance+-optimal monotone estimator.

• It is $\prec^+$-optimal with respect to the partial order $\prec$

$v \prec z \iff RG_p(v) < RG_p(z)$ .

• It has finite variances and is 4-competitive:

$\forall v, \quad VAR[\hat{RG}_p^{(L)} \mid v] + RG_p(v)^2 \le 4\left(VAR[\hat{RG}_p^{(v)} \mid v] + RG_p(v)^2\right)$ .
Figure 3: Top: The lower bound function and corresponding lower hull for example vectors and $p \in \{0.5, 1, 2\}$.
Bottom: the corresponding optimal, L, and U estimates on outcomes consistent with the vector.

$\prec^+$-optimality with respect to this particular order means that any estimator with a strictly lower variance for a
data vector must have strictly higher variance on some vector with a smaller range. This means that the L estimator
"prioritizes" data where the range (or difference when aggregated) is small. "Competitiveness" is a strong property
that means that for all data vectors, the variance under the L estimator is not too far off the minimum possible variance
for that vector by a nonnegative unbiased estimator.

We solve the equations to derive the L estimator under PPS sampling. For an outcome $S(u, v)$, we define $v_{\min} =
\min(v)$ if $|S| = r$ and $v_{\min} = u\tau$ otherwise.



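$\hat{RG}_p^{(L)}(S) = 0$ if $|S| = 0$, and if $|S| \ge 1$:

$\hat{RG}_p^{(L)}(S) = (\max(v) - v_{\min})^p \max\{1, \frac{\tau}{v_{\min}}\} - \int_{\min\{1, v_{\min}/\tau\}}^{\min\{1, \max(v)/\tau\}} \frac{(\max(v) - x\tau)^p}{x^2}\, dx$   (10)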
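For $p = 1$ the integral has an elementary antiderivative, which gives a compact way to evaluate the shared-seed L estimate and to check its properties numerically. The Python sketch below assumes PPS with a common threshold `tau` and a form of the estimator in which the integrand is $(\max(v) - x\tau)/x^2$ over $x \in [\min\{1, v_{\min}/\tau\}, \min\{1, \max(v)/\tau\}]$; function names and the midpoint-rule integration over the seed are hypothetical choices for this illustration.

```python
import math


def l_estimate_p1(u, v, tau):
    """Shared-seed L estimate of max(v) - min(v) at seed u (p = 1)."""
    hi, lo = max(v), min(v)
    if hi < u * tau:
        return 0.0  # no entry is sampled
    v_min = lo if lo >= u * tau else u * tau
    a = min(1.0, v_min / tau)
    b = min(1.0, hi / tau)
    # Closed form of the integral of (hi - x * tau) / x**2 over [a, b].
    integral = hi * (1.0 / a - 1.0 / b) - tau * math.log(b / a)
    return (hi - v_min) * max(1.0, tau / v_min) - integral


def moments(v, tau, n=1000):
    """Midpoint-rule mean and second moment of the estimate over u in (0, 1)."""
    ests = [l_estimate_p1((i + 0.5) / n, v, tau) for i in range(n)]
    return sum(ests) / n, sum(e * e for e in ests) / n
```

When $\max(v) \le \tau$ the estimate simplifies to $\tau \ln(\max(v)/v_{\min})$, its mean is $\max(v) - \min(v)$, and its second moment can be compared with the optimal second moment $RG(v)^2 \tau / \max(v)$ to observe the competitive ratio on a given vector.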



Estimators and variance for RG and RG2 are provided in Tables [6] and [7] We obtain the following tight ratios on 
competitiveness: 



var[rg \v] + RG(«) 2 
var[rg ] + RG(l>) 2 
var[rg 2 (L V] + RG 2 (v) 2 



< 2 



< 2.5 



VAR[RG 2 '~ J + RG 2 (t>) 2 

The U estimator. 

Theorem 5.3. [13] The estimator RG^ (p > 0), specified as the solution of 

m.p(v,z)-JpRG P (u,v)di 



(11) 
(12) 



VS 1 = S(p,v), RG p (ft,v) = sup inf 



P-T) 



has the following properties: 
• It is nonnegative and unbiased. 
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• It is -^-optimal with respect to the partial order -< 

V -< Z ^=^> RG p (-u) > RG p (z) . 

• It has finite variances for all data vectors. 

The U estimator "prioritizes' ' data where the range (or difference when aggregated) is large. In particular, it is the 
nonnegative unbiased estimator with minimum variance on data with min(u) = 0. 

The solution for PPS with all-entries equal $\tau$ is provided as Algorithm 1 (see Appendix A for calculation).
The estimator $\hat{RG}_p^{(U)}$ and its variance for $p = 1, 2$ are provided in Tables 8 and 9 (see Appendix B for details).
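For $p = 1$ the U estimator takes only a few values, which makes it easy to sketch. The code below is an illustration under stated assumptions, not the authors' code: entry $i$ is sampled iff $v_i \ge u\tau$, the estimate is 0 on an empty outcome, $\max\{\tau, \max(v)\}$ when only some entries are sampled, and $\max\{\max(v),\tau\} - \max\{\min(v),\tau\}$ when all are sampled. The closed-form variance uses the three cases $\min(v) \ge \tau$, $\max(v) \le \tau$, and $\min(v) \le \tau \le \max(v)$. Function names are hypothetical.

```python
def rg_u_p1(u, v, tau):
    """U estimate of max(v) - min(v) for p = 1 (assumes v[i] sampled iff v[i] >= u * tau)."""
    sampled = [x for x in v if x >= u * tau]
    if not sampled:
        return 0.0
    if len(sampled) < len(v):          # some entry is missing from the outcome
        return max(tau, max(sampled))
    return max(max(sampled), tau) - max(min(sampled), tau)

def var_u_p1(v, tau):
    """Closed-form variance of the p = 1 U estimator for data v."""
    hi, lo = max(v), min(v)
    if lo >= tau:                      # both entries always sampled
        return 0.0
    if hi <= tau:
        rg = hi - lo
        return rg * (tau - rg)
    return lo * (tau - lo)             # min(v) <= tau <= max(v)

def numeric_moments(v, tau, steps=100_000):
    """Mean and variance of the estimate, integrating over the seed by the midpoint rule."""
    values = [rg_u_p1((k + 0.5) / steps, v, tau) for k in range(steps)]
    mean = sum(values) / steps
    return mean, sum(x * x for x in values) / steps - mean * mean
```

With $\tau = 10$, the vector $(3, 1)$ gives variance $2 \cdot 8 = 16$ and the vector $(15, 4)$ gives $4 \cdot 6 = 24$.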
Figure 4: Variance (normalized by square of expectation) of the $\hat{RG}_p^{(L)}$ estimator over independent samples and of the $\hat{RG}_p^{(L)}$ and $\hat{RG}_p^{(U)}$ estimators over shared-seed samples. Sampling with all-entries equal $\tau$. (A): data with $\max(v) = 0.25\tau$. (B): data with $\max(v) = 0.01\tau$. (C): ratio of the variances of $\hat{RG}_p^{(L)}$ and $\hat{RG}_p^{(U)}$ for shared-seed sampling, selected ratios $\max(v)/\tau$. Sweeping $\min(v)$ (horizontal axis: vmin/vmax). Top shows $p = 1$, bottom is $p = 2$.



Choosing an estimator. How to choose between the L and U estimators? Figure 3 shows the v-optimal estimates and
the L and U estimators for example vectors, illustrating their form and the monotonicity of the L and v-optimal estimates.

The estimators and their variance depend only on $\tau$ and the maximum and minimum entry values $\max(v)$ and
$\min(v)$. For all-entries-equal $\tau$ and $\max(v) \le \tau$, we study the variance dependence on the ratio $\min(v)/\max(v)$.
The estimator $\hat{RG}_p^{(U)}$ has lower variance when $\min(v)/\max(v)$ is sufficiently small, that is, when $RG(v)$ is large relative to $\max(v)$. The solution $\phi_p$ of
$\mathrm{VAR}[\hat{RG}_p^{(U)} \mid v] = \mathrm{VAR}[\hat{RG}_p^{(L)} \mid v]$ for $x = \min(v)/\max(v)$ is such that

$$\mathrm{VAR}[\hat{RG}_p^{(U)} \mid v] \le \mathrm{VAR}[\hat{RG}_p^{(L)} \mid v] \iff \frac{\min(v)}{\max(v)} \le \phi_p .$$

For $p = 1$, $\phi_1 \approx 0.285$ (the solution of the equality $(1 - x)/(2x) = \ln(1/x)$). For $p = 2$, $\phi_2 \approx 0.258$.

This suggests selecting an estimator according to expected characteristics of the data. If we expect $RG(v) \ge
(1 - \phi_p)\max(v)$, we choose $\hat{RG}_p^{(U)}$ and otherwise choose $\hat{RG}_p^{(L)}$.
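The thresholds $\phi_1$ and $\phi_2$ can be recovered numerically. The sketch below is a hedged illustration: it assumes the closed-form variances for $\max(v) \le \tau$, namely $\mathrm{VAR}[\hat{RG}^{(U)}] = RG(v)\tau - RG(v)^2$ and $\mathrm{VAR}[\hat{RG}^{(L)}] = 2RG(v)\tau - RG(v)^2 - 2\tau\min(v)\ln(\max(v)/\min(v))$ for $p = 1$, and $\mathrm{VAR}[\hat{RG}_2^{(U)}] = RG(v)^3(\tfrac{4}{3}\tau - RG(v))$ and $\mathrm{VAR}[\hat{RG}_2^{(L)}] = \tfrac{2\tau}{3}(5\max(v)^3 - 9\max(v)\min(v)^2 + 4\min(v)^3) - 4\tau\max(v)\min(v)(2\max(v)-\min(v))\ln(\max(v)/\min(v)) - RG(v)^4$ for $p = 2$. Dividing the difference of variances by a positive factor leaves a function of $x = \min(v)/\max(v)$ only, whose root in $(0, 1)$ is found by bisection. Function names and the bracketing interval are our own choices.

```python
import math

def gap_p1(x):
    """(VAR[U] - VAR[L]) / (tau * max(v)) for p = 1, with x = min(v)/max(v)."""
    return 2.0 * x * math.log(1.0 / x) - (1.0 - x)

def gap_p2(x):
    """(VAR[U] - VAR[L]) / (tau * max(v)^3) for p = 2, with x = min(v)/max(v)."""
    var_u = 4.0 / 3.0 * (1.0 - x) ** 3
    var_l = (2.0 / 3.0 * (5.0 - 9.0 * x ** 2 + 4.0 * x ** 3)
             + 4.0 * x * (2.0 - x) * math.log(x))
    return var_u - var_l

def bisect_root(f, lo=0.01, hi=0.9, iterations=60):
    """Bisection on [lo, hi]; requires f(lo) < 0 < f(hi)."""
    assert f(lo) < 0.0 < f(hi)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

phi_1 = bisect_root(gap_p1)
phi_2 = bisect_root(gap_p2)
```

A negative gap means the U estimator has the lower variance, so for ratios $\min(v)/\max(v)$ below the root the U estimator is preferable.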

The variance of the L estimator over independent samples and of the L and U estimators over shared-seed samples
is illustrated in Figure 4. The figure also illustrates the relation between the variance of the shared-seed L and U
dataset     #keys        p1%     p2%     Σ v_i(h)       p1%      p2%      L1/Σ      L1+/Σ     L1-/Σ
destIP      3.8 × 10^4   65%     65%     1.1 × 10^?     49%      51%      0.36      0.19      0.18
Server      2.7 × 10^5   53%     56%     2.9 × 10^6     50%      50%      0.75      0.38      0.37
Surnames    1.9 × 10^4   100%    100%    8.9 × 10^7     48.6%    51.4%    0.094     0.0617    0.0327
OSPD8       7.5 × 10^5   99%     99%     1.57 × 10^10   46.8%    53.2%    0.0826    0.0727    0.0099

Table 5: Datasets. Table shows total number of distinct keys with positive value in at least one of the two instances,
corresponding percentage in each instance, total sum of values $\sum_{i,h} v_i(h) = \sum_{h \in K} \sum_{i \in [2]} v_i(h)$, and fraction (shown
as percentage) in instance $i = 1, 2$, and normalized $L_1$, $L_{1+}$ and $L_{1-}$ differences.



estimators. When $\max(v)/\tau \ll 1$ (which we expect to be a prevailing scenario), $\mathrm{VAR}[\hat{RG}^{(L)}]$ is nearly at most 2 times
$\mathrm{VAR}[\hat{RG}^{(U)}]$, but as $\min(v) \to \max(v)$, the ratio $\mathrm{VAR}[\hat{RG}^{(U)}]/\mathrm{VAR}[\hat{RG}^{(L)}]$ is not bounded. When $\max(v)/\tau$ is close to
1, the variance of the U estimator is close to 0, and $\mathrm{VAR}[\hat{RG}^{(L)}]/\mathrm{VAR}[\hat{RG}^{(U)}]$ is not bounded. Interestingly, for $p = 2$,
the variance of the U estimator is always at least ^rg±{v), and thus, using ( fT2l . the variance of the L estimator is at 
most 4.4 times the variance of the D-optimal (and thus of the U estimator). 



6 Experimental Evaluation 

We study the estimation quality of our $L_p^p$ estimators, recalling that our $L_p^p$ estimate is the sum of $RG_p$ estimates:
$\hat{L_p^p} = \sum_{h \in K} \hat{RG}_p(h)$. This estimator is unbiased and has expectation $L_p^p = \sum_{h \in K} RG_p(v(h))$. The variance is
additive and is $\sum_{h \in K} \mathrm{VAR}[\hat{RG}_p(h)]$. The squared coefficient of variation is the ratio of the variance and the square of
the expectation:

$$CV^2(\hat{L_p^p}) = \frac{\sum_{h \in K} \mathrm{VAR}[\hat{RG}_p(h)]}{\left(\sum_{h \in K} RG_p(v(h))\right)^2} .$$
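As a small illustration of the squared coefficient of variation of a sum estimator, the sketch below adds per-key variances and per-key true values. The per-key variance used in the example is the $p = 1$ U-estimator variance $RG(v)(\tau - RG(v))$ for keys with $\max(v) \le \tau$; the data values and function names are made up for illustration.

```python
def squared_cv(per_key_variance, per_key_value):
    """CV^2 of a sum of independent per-key estimators: total variance / (total value)^2."""
    total_value = sum(per_key_value)
    return sum(per_key_variance) / (total_value * total_value)

tau = 10.0
# Hypothetical two-instance values (v_1(h), v_2(h)) for three keys, all at most tau.
data = {"key_a": (3.0, 1.0), "key_b": (5.0, 5.0), "key_c": (2.0, 0.0)}

ranges = [max(v) - min(v) for v in data.values()]   # RG(v(h)) for p = 1
variances = [rg * (tau - rg) for rg in ranges]      # VAR of the p = 1 U estimator, max(v) <= tau
cv2 = squared_cv(variances, ranges)
```

Here the ranges are 2, 0, 2 and the variances 16, 0, 16, so `cv2` is 32/16 = 2. With many keys the numerator grows linearly while the denominator grows quadratically, which is why the aggregate estimate can be accurate even when single-key estimates are not.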

Figure 5 shows the squared coefficient of variation $CV^2$ of our $L_p^p$ estimators as a function of the sampled fraction
of the dataset. Each of the two instances was subjected to Poisson PPS [22] sampling (results are essentially identical
for priority sampling [27, 18]). We study accuracy when applying the single-key estimator $\hat{RG}_p^{(L)}$ for independent
samples of instances and the estimators $\hat{RG}_p^{(L)}$ and $\hat{RG}_p^{(U)}$ for shared-seed (coordinated) samples of the two instances,
for $p = 1, 2$.
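The difference between independent and shared-seed (coordinated) Poisson PPS samples can be sketched as follows. This is an illustrative sketch, not the experimental code: it assumes a key is included in the sample of an instance iff its value is at least $u(h)\tau$, and it derives the shared seed $u(h)$ from a SHA-256 hash of the key, which is our own choice. Independent samples would instead use a different seed function for each instance.

```python
import hashlib

def shared_seed(key):
    """Map a key to a reproducible seed in (0, 1]; hashing with SHA-256 is an assumption."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2 ** 64

def pps_sample(instance, tau, seed_of=shared_seed):
    """Poisson PPS sample: keep key h iff v(h) >= u(h) * tau."""
    return {h: value for h, value in instance.items() if value >= seed_of(h) * tau}

# Two hypothetical instances over the same keys.
instance_1 = {h: 30.0 for h in range(1000)}
instance_2 = {h: 60.0 for h in range(1000)}
sample_1 = pps_sample(instance_1, tau=100.0)
sample_2 = pps_sample(instance_2, tau=100.0)
```

Because both instances use the same seed per key, every key sampled with the smaller value is also sampled with the larger one, which is the property the shared-seed estimators exploit.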

We used 4 datasets with properties summarized in Table 5.
destIP: Keys: (anonymized) IP destination addresses. Values: the number of IP flows to this destination IP (source: IP
packet traces). Instances: two consecutive time periods.

Server: Keys: (anonymized) source IP addresses. Values: the number of HTTP requests issued to the server from this
address. Instances: two consecutive time periods. Source: Web server log.

Surnames: Keys: the $18.5 \times 10^3$ most common surnames in the US. Values: the number of occurrences of the surname
in English books digitized by Google and published within the time period [26]. Instances: the years 2007 and 2008.

OSPD8: Keys: $7.5 \times 10^4$ 8-letter words that appear in the Official Scrabble Players Dictionary (OSPD). Values: the
number of occurrences of the term in English books digitized by Google and published within a time period [26].
Instances: the years 2007 and 2008.

We can see qualitatively that all estimators, even over independent samples, are satisfactory, in that the CV is small
for a sample that is a small fraction of the full data set. The monotone estimator $\hat{RG}_p^{(L)}$ over coordinated (shared-seed)
samples outperforms, by orders of magnitude, the monotone estimator $\hat{RG}_p^{(L)}$ over independent samples. The
gap widens for more aggressive sampling.

The first two datasets (destIP and Server) exhibit significant difference between instances: the $L_1$ distance is
a large fraction of the total sum of values $\sum_{h \in K} \sum_{i \in [2]} v_i(h)$. On these datasets, $\hat{RG}_p^{(U)}$ outperforms $\hat{RG}_p^{(L)}$ on
shared-seed samples. The last two datasets (Surnames and OSPD8) have small difference between instances and
$\hat{RG}_p^{(L)}$ outperforms $\hat{RG}_p^{(U)}$ on shared-seed samples. These trends are more pronounced for the higher moment $p = 2$.
In this case, on the Surnames and OSPD8 datasets, $\hat{RG}_p^{(L)}$ over independent samples outperforms $\hat{RG}_p^{(U)}$ over shared-seed
samples. We can see that we can significantly improve accuracy by tailoring the selection of the estimator to
properties of the data. The performance of the U estimator, however, can significantly diverge with similarity whereas
the competitive L estimator is guaranteed not to be too far off. Therefore, when there is no prior knowledge on the
difference, we suggest using the L estimator.

The datasets also differ in the symmetry of change. The change is more symmetric in the first two data sets
($L_{p+} \approx L_{p-}$) whereas there is a general growth trend ($L_{p+} \gg L_{p-}$) in the last two datasets. We did not include
performance figures for the asymmetric differences $\sum_{h \in K} \hat{RG}_{p+}(v(h))$ and $\sum_{h \in K} \hat{RG}_{p-}(v(h))$, but trends are similar to
the symmetric variants.

7 Conclusion 

Difference queries are essential for monitoring, planning, and anomaly and change detection. Random sampling is 
an important tool for retaining the ability to query data under resource limitations. We provide the first satisfactory 
solution for estimating differences over sampled data sets. Our solution is comprehensive, covering common sampling 
schemes. It is supported by rigorous analysis and novel techniques that also allow us to gain deeper understanding and 
establish optimality. We demonstrated that our estimators perform well on diverse data sets. 
Appendix

$$\hat{RG}^{(L)}(S) = \begin{cases} 0 & |S| = 0 \\[4pt] \max\{\max(v) - \tau, 0\} - \max\{\min(v) - \tau, 0\} + \tau \ln \dfrac{\min\{\max(v), \tau\}}{\min\{v_{\min}, \tau\}} & |S| \ge 1 \end{cases} \qquad (13)$$

Condition                                   $\mathrm{VAR}[\hat{RG}^{(L)} \mid v]$
$\min(v) \ge \tau$                          $0$
$\max(v) \le \tau$, $\min(v) = 0$           $2RG(v)\tau - RG(v)^2$
$\max(v) \le \tau$, $\min(v) > 0$           $2RG(v)\tau - RG(v)^2 - 2\tau\min(v)\ln\left(\frac{\max(v)}{\min(v)}\right)$
$0 < \min(v) \le \tau \le \max(v)$          $\tau^2 - \min(v)^2 - 2\tau\min(v)\ln\left(\frac{\tau}{\min(v)}\right)$
$0 = \min(v)$, $\tau \le \max(v)$           $\tau^2 - \min(v)^2$

Table 6: The estimator $\hat{RG}^{(L)}$ (top) and variance for data $v$ (bottom) for shared-seed sampling



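$$\hat{RG}_2^{(L)}(S) = \begin{cases} 0 & |S| = 0 \\[4pt] \max\{\max(v), \tau\}^2 - \max\{\min(v), \tau\}^2 - 2\max\{\min(v), \tau\}(\max(v) - v_{\min}) - 2\tau\max(v)\ln\dfrac{\min\{v_{\min}, \tau\}}{\min\{\max(v), \tau\}} & |S| \ge 1 \end{cases}$$

Condition: $\min(v) \ge \tau$

$$\mathrm{VAR}[\hat{RG}_2^{(L)} \mid v] = 0$$

Condition: $\max(v) \le \tau$

$$\mathrm{VAR}[\hat{RG}_2^{(L)} \mid v] = -4\tau\max(v)\min(v)\ln\left(\frac{\max(v)}{\min(v)}\right)(2\max(v) - \min(v)) + \frac{2\tau}{3}\left(5\max(v)^3 + 4\min(v)^3 - 9\max(v)\min(v)^2\right) - (\max(v) - \min(v))^4 \qquad (15)$$

Condition: $\min(v) < \tau \le \max(v)$

$$\begin{aligned}\mathrm{VAR}[\hat{RG}_2^{(L)} \mid v] ={}& 4\max(v)\min(v)\tau(\min(v) - 2\max(v))\ln\frac{\tau}{\min(v)} + 4\max(v)\min(v)\tau^2 + \frac{\tau^4}{3} + \frac{8\min(v)^3\tau}{3} - 2\min(v)^2\tau^2 \\ &- 6\max(v)\min(v)^2\tau - 4\max(v)^2\min(v)^2 - \min(v)^4 + 4\max(v)\min(v)^3 + 4\max(v)^2\tau^2 - 2\max(v)\tau^3 \end{aligned} \qquad (16)$$

Table 7: The estimator $\hat{RG}_2^{(L)}$ (top) and variance for data $v$ (bottom) for shared-seed sampling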



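A Derivation of $\hat{RG}_p^{(U)}$

If $\rho\tau > \max(v)$ then $\underline{RG}_p(\rho, v) = 0$ and $\hat{RG}_p^{(U)}(\rho, v) = 0$. Otherwise, when $\min(v) < \rho\tau \le \max(v)$, noting that the
supremum is obtained by a vector $v'$ with maximum entry $\max(v)$ and minimum entry 0,

$$\hat{RG}_p^{(U)}(\rho, v) = \inf_{0 \le \eta < \rho} \frac{(\max(v) - \eta\tau)^p - \int_\rho^{\min\{1,\, \max(v)/\tau\}} \hat{RG}_p^{(U)}(u, v)\,du}{\rho - \eta} \qquad (17)$$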
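condition              $\hat{RG}^{(U)}$
$|S| = 0$              $0$
$1 \le |S| < r$        $\max\{\tau, \max(v)\}$
$|S| = r$              $\max\{\max(v), \tau\} - \max\{\min(v), \tau\}$

condition on $v$                      $\mathrm{VAR}[\hat{RG}^{(U)} \mid v]$
$\min(v) \ge \tau$                    $0$
$\max(v) \le \tau$                    $RG(v)(\tau - RG(v))$
$\min(v) \le \tau \le \max(v)$        $\min(v)(\tau - \min(v))$

Table 8: The estimator $\hat{RG}^{(U)}$ (left) and its variance $\mathrm{VAR}[\hat{RG}^{(U)} \mid v]$ (right) for shared-seed sampling

In the following, $\eta_0(v) = 2 - \max(v)/\tau$.

condition on $S(\rho, v)$                                                                      $\hat{RG}_2^{(U)}(S)$
$|S| = 0$                                                                                      $0$
$\min(v) \ge \tau$                                                                             $RG_2(v)$
$\rho > \frac{\min(v)}{\tau}$, $\frac{\max(v)}{\tau} \le 1$                                    $2\tau(\max(v) - \rho\tau)$
$\rho \le \frac{\min(v)}{\tau}$, $\frac{\max(v)}{\tau} \le 1$                                  $0$
$\rho > \frac{\min(v)}{\tau}$, $\frac{\max(v)}{\tau} \ge 2$                                    $\max(v)^2$
$\rho \le \frac{\min(v)}{\tau} < 1$, $\frac{\max(v)}{\tau} \ge 2$                              $\max(v)^2 - 2\tau\max(v) + \min(v)\tau$
$\rho \ge \max\{\eta_0(v), \frac{\min(v)}{\tau}\}$, $\frac{\max(v)}{\tau} \in (1, 2)$          $4\tau(\max(v) - \tau)$
$\frac{\min(v)}{\tau} < \rho < \eta_0(v)$, $\frac{\max(v)}{\tau} \in (1, 2)$                   $2\tau(\max(v) - \rho\tau)$
$\rho \le \frac{\min(v)}{\tau} < \eta_0(v)$, $\frac{\max(v)}{\tau} \in (1, 2)$                 $0$
$\rho \le \frac{\min(v)}{\tau}$, $\eta_0(v) \le \frac{\min(v)}{\tau} < 1$, $\frac{\max(v)}{\tau} \in (1, 2)$    $\frac{\tau}{\min(v)} RG_2(v) - 4\tau(\max(v) - \tau)\left(\frac{\tau}{\min(v)} - 1\right)$

condition on $v$                                                             $\mathrm{VAR}[\hat{RG}_2^{(U)} \mid v]$
$\min(v) \ge \tau$                                                           $0$
$\max(v) \le \tau$                                                           $RG_3(v)\left(\frac{4}{3}\tau - RG(v)\right)$
$\frac{\max(v)}{\tau} \in (1, 2)$, $\frac{\min(v)}{\tau} \ge \eta_0(v)$      $\frac{\tau - \min(v)}{\min(v)}\left(RG_2(v) - 4\tau(\max(v) - \tau)\right)^2$
$\frac{\max(v)}{\tau} \in (1, 2)$, $\frac{\min(v)}{\tau} < \eta_0(v)$        $\frac{\max(v) - \tau}{\tau}\left(4\tau(\max(v) - \tau) - RG_2(v)\right)^2 + \frac{\min(v)}{\tau} RG_4(v) + \frac{(RG_2(v) + 4\tau^2 - 4\max(v)\tau)^3 - RG_3(v)(RG(v) - 2\tau)^3}{6\tau^2}$
$\frac{\max(v)}{\tau} \ge 2$, $\min(v) < \tau$                               $(2\max(v) - \min(v))^2 \min(v)(\tau - \min(v))$

Table 9: The estimator $\hat{RG}_2^{(U)}$ (top) and its variance $\mathrm{VAR}[\hat{RG}_2^{(U)} \mid v]$ (bottom) for shared-seed sampling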
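Algorithm 1 $\hat{RG}_p^{(U)}(S)$

if $|S| = 0$ then return $0$                                          ▷ from hereafter $|S| > 0$
$m \leftarrow \max_{i \in S} v_i$                                     ▷ $m = \max(v)$
if $|S| < r$ then $n \leftarrow 0$
else $n \leftarrow \min_{i \in S} v_i$                                ▷ $n = \min(S)$
if $n \ge \tau$ then return $(m - n)^p$                               ▷ case: $\min(v) \ge \tau$
if $p \le 1$ then                                                     ▷ case: $\min(v) < \tau$ and $p \le 1$
    if $n = 0$ then return $\tau m^p / \min\{m, \tau\}$
    else return $\frac{\tau}{n}\left((m - n)^p - m^p\,\frac{\min\{m, \tau\} - n}{\min\{m, \tau\}}\right)$
if $m \le \tau$ then                                                  ▷ case: $\max(v) \le \tau$, $p > 1$
    if $\rho\tau > n$ then return $p\tau(m - \rho\tau)^{p-1}$
    else return $0$
$\eta_0 \leftarrow \frac{p\tau - m}{\tau(p - 1)}$                     ▷ case: $n < \tau < \max(v)$ and $p > 1$
if $\eta_0 \in (0, 1)$ then                                           ▷ subcase: $\eta_0 \in (0, 1)$
    if $\rho \ge \max\{\eta_0, n/\tau\}$ then return $\frac{(m - \eta_0\tau)^p}{1 - \eta_0}$
    if $n/\tau < \rho < \eta_0$ then return $p\tau(m - \rho\tau)^{p-1}$
    if $\rho \le n/\tau < \eta_0$ then return $0$
    if $\rho \le n/\tau$ and $n/\tau \ge \eta_0$ then
        return $\frac{\tau(m - n)^p}{n} - \frac{(\tau - n)(m - \eta_0\tau)^p}{n(1 - \eta_0)}$
else                                                                  ▷ subcase: $\eta_0 \notin (0, 1)$
    if $\rho\tau > n$ then return $m^p$
    else return $\frac{\tau}{n}(m - n)^p - m^p\left(\frac{\tau}{n} - 1\right)$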



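If $\rho\tau \le \min(v)$,

$$\hat{RG}_p^{(U)}(\rho, v) = \frac{RG_p(v) - \int_{\min\{1,\, \min(v)/\tau\}}^{\min\{1,\, \max(v)/\tau\}} \hat{RG}_p^{(U)}(u, v)\,du}{\min\{1, \min(v)/\tau\}} \qquad (18)$$

If $\min(v) \ge \tau$, $|S| = r$, $\hat{RG}_p^{(L)} = \hat{RG}_p^{(U)} = RG_p(v)$. If $\max(v) \le \tau$, $\int_\rho^1 \hat{RG}_p^{(U)}(u, v)\,du = \underline{RG}_p(\rho, v)$ and the
infimum is the derivative of the lower bound function, and thus, $\hat{RG}_p^{(U)}(\rho, v) = p\tau(\max(v) - \rho\tau)^{p-1}$.

$$\hat{RG}_p^{(U)}(S) = \begin{cases} 0 & |S| \in \{0, r\} \\ p\tau(\max(v) - \rho\tau)^{p-1} & |S| \in \{1, \ldots, r - 1\} \end{cases} \qquad (19)$$

We now consider the case $\min(v) < \tau < \max(v)$, solving (17) for $\rho > \min(v)/\tau$. For $\rho = 1$ we obtain the
equation

$$\hat{RG}_p^{(U)}(1, v) = \inf_{0 \le \eta < 1} \frac{(\max(v) - \eta\tau)^p}{1 - \eta} .$$

When $p = 1$, the derivative is positive and the infimum is $\max(v)$. We obtain that $\hat{RG}^{(U)}(\rho, v) = \max(v)$ for
$\rho > \min(v)/\tau$. Using (18), $\hat{RG}^{(U)}(\rho, v) = \max(v) - \tau$ when $\rho \le \min(v)/\tau$.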
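For $p \ne 1$, we need to find the value where $h(\eta) = \frac{(\max(v) - \eta\tau)^p}{1 - \eta}$ is minimized. The derivative is

$$\frac{\partial h(\eta)}{\partial \eta} = \frac{(\max(v) - \eta\tau)^{p-1}}{1 - \eta}\left(\frac{\max(v) - \eta\tau}{1 - \eta} - p\tau\right) .$$

The derivative is 0 at

$$\eta_0 = \frac{p\tau - \max(v)}{\tau(p - 1)} .$$

If $\eta_0$ is outside $(0, 1)$, the infimum is obtained at $\eta = 0$ and the estimate is $\hat{RG}_p^{(U)}(\rho, v) = \max(v)^p$ for $\rho > \min(v)/\tau$
and, using (18),

$$\hat{RG}_p^{(U)}(\rho, v) = \frac{\tau}{\min(v)} RG_p(v) - \max(v)^p\left(\frac{\tau}{\min(v)} - 1\right)$$

for $\rho \le \min(v)/\tau$.

Otherwise, if $\eta_0 \in (0, 1)$, the infimum is achieved at $\eta_0$. Using (17), the estimate is

$$\hat{RG}_p^{(U)}(\rho, v) = \frac{(\max(v) - \eta_0\tau)^p}{1 - \eta_0}$$

for $\rho \in [\max\{\eta_0, \frac{\min(v)}{\tau}\}, 1]$ and $\hat{RG}_p^{(U)}(\rho, v) = p\tau(\max(v) - \rho\tau)^{p-1}$ for $\rho \in (\frac{\min(v)}{\tau}, \eta_0)$. Using (18), when $\rho \le
\frac{\min(v)}{\tau}$, then $\hat{RG}_p^{(U)}(\rho, v) = 0$ when $\frac{\min(v)}{\tau} < \eta_0$ and

$$\hat{RG}_p^{(U)}(\rho, v) = \frac{\tau}{\min(v)} RG_p(v) - \frac{(\max(v) - \eta_0\tau)^p}{1 - \eta_0}\left(\frac{\tau}{\min(v)} - 1\right)$$

when $\frac{\min(v)}{\tau} \ge \eta_0$.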



B Variance of $\hat{RG}^{(U)}$ and $\hat{RG}_2^{(U)}$

The estimators $\hat{RG}^{(U)}$ and $\hat{RG}_2^{(U)}$, provided in Tables 8 and 9, are obtained by substituting $p = 1$ and $p = 2$ respectively
in Algorithm 1. We calculate the variance of these estimators.
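The case analysis for $p = 2$ can be checked numerically. The sketch below is our reading of the $p = 2$ specialization (with $\eta_0 = 2 - \max(v)/\tau$), not the authors' code. It assumes the outcome is obtained by sampling entry $i$ iff $v_i \ge \rho\tau$ and that the seed $\rho$ is available to the estimator; function names are illustrative. Integrating the estimate over $\rho$ should reproduce $RG_2(v) = (\max(v) - \min(v))^2$ in every case.

```python
def rg_u_p2(rho, v, tau):
    """U estimate of (max(v) - min(v))^2 from the outcome S(rho, v), p = 2."""
    sampled = [x for x in v if x >= rho * tau]
    if not sampled:
        return 0.0
    m = max(sampled)
    full = len(sampled) == len(v)          # |S| = r: min(v) is known
    n = min(sampled) if full else 0.0
    if n >= tau:                           # case min(v) >= tau
        return (m - n) ** 2
    if m <= tau:                           # case max(v) <= tau
        return 0.0 if full else 2.0 * tau * (m - rho * tau)
    eta0 = 2.0 - m / tau
    if eta0 <= 0.0:                        # case max(v) >= 2 * tau
        return m * m - 2.0 * tau * m + n * tau if full else m * m
    top = 4.0 * tau * (m - tau)            # case tau < max(v) < 2 * tau
    if not full:
        return top if rho >= eta0 else 2.0 * tau * (m - rho * tau)
    if n / tau < eta0:
        return 0.0
    return tau / n * (m - n) ** 2 - top * (tau / n - 1.0)

def moments_p2(v, tau, steps=50_000):
    """Mean and variance over the seed, by the midpoint rule on (0, 1]."""
    values = [rg_u_p2((k + 0.5) / steps, v, tau) for k in range(steps)]
    mean = sum(values) / steps
    return mean, sum(x * x for x in values) / steps - mean * mean
```

With $\tau = 10$, the vectors $(3, 1)$, $(25, 4)$, $(15, 8)$ and $(15, 2)$ exercise the four cases with $\min(v) < \tau$; their means should be 4, 441, 49 and 169.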

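Variance of $\hat{RG}^{(U)}$: When $\max(v) \le \tau$, we have $\hat{RG}^{(U)} = \tau$ for $\rho \in (\frac{\min(v)}{\tau}, \frac{\max(v)}{\tau}]$ and $\hat{RG}^{(U)} = 0$ otherwise.
Hence,

$$\mathrm{VAR}[\hat{RG}^{(U)} \mid v] = RG(v)^2\left(1 - \frac{RG(v)}{\tau}\right) + (\tau - RG(v))^2\,\frac{RG(v)}{\tau} = RG(v)\tau - RG(v)^2 .$$

Variance of $\hat{RG}_2^{(U)}$: When $\max(v) \ge \tau$, we have $\eta_0 = 2 - \frac{\max(v)}{\tau}$. Thus $\eta_0 \in (0, 1) \iff \frac{\max(v)}{\tau} \in (1, 2)$. We use

$$\int \left(RG_2(v) - 2\tau\max(v) + 2\tau^2 u\right)^2 du = \frac{\left(RG_2(v) - 2\tau\max(v) + 2\tau^2 u\right)^3}{6\tau^2} .$$

We start with the case $\max(v) \le \tau$.

$$\begin{aligned}\mathrm{VAR}[\hat{RG}_2^{(U)} \mid v] &= \left(1 - \frac{RG(v)}{\tau}\right)RG_4(v) + \int_{\min(v)/\tau}^{\max(v)/\tau} \left(RG_2(v) - 2\tau\max(v) + 2\tau^2 u\right)^2 du \\ &= \left(1 - \frac{RG(v)}{\tau}\right)RG_4(v) + \frac{RG_6(v)}{6\tau^2} - \frac{RG_3(v)(RG(v) - 2\tau)^3}{6\tau^2} \\ &= RG_3(v)\left(\frac{4}{3}\tau - RG(v)\right) \end{aligned}$$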
The case $\max(v) \ge 2\tau$:

$$\begin{aligned}\mathrm{VAR}[\hat{RG}_2^{(U)} \mid v] &= \left(1 - \frac{\min(v)}{\tau}\right)\left(\max(v)^2 - RG_2(v)\right)^2 + \frac{\min(v)}{\tau}\left(\max(v)^2 - 2\tau\max(v) + \min(v)\tau - RG_2(v)\right)^2 \\ &= (\max(v) - RG(v))^2(\max(v) + RG(v))^2\left(1 - \frac{\min(v)}{\tau}\right) + \frac{\min(v)}{\tau}(\tau - \min(v))^2(2\max(v) - \min(v))^2 \\ &= \frac{\tau - \min(v)}{\tau}\min(v)^2(2\max(v) - \min(v))^2 + \frac{\min(v)}{\tau}(\tau - \min(v))^2(2\max(v) - \min(v))^2 \\ &= (2\max(v) - \min(v))^2\min(v)(\tau - \min(v)) \end{aligned}$$

We next handle the case $\tau < \max(v) < 2\tau$, $\frac{\min(v)}{\tau} \ge \eta_0$.

$$\begin{aligned}\mathrm{VAR}[\hat{RG}_2^{(U)} \mid v] &= \left(1 - \frac{\min(v)}{\tau}\right)\left(4\tau(\max(v) - \tau) - RG_2(v)\right)^2 + \frac{\min(v)}{\tau}\left(\frac{\tau}{\min(v)}RG_2(v) - 4\tau(\max(v) - \tau)\left(\frac{\tau}{\min(v)} - 1\right) - RG_2(v)\right)^2 \\ &= \left(RG_2(v) - 4\tau(\max(v) - \tau)\right)^2\,\frac{\tau - \min(v)}{\min(v)} \end{aligned}$$

Lastly, for the case $\tau < \max(v) < 2\tau$, $\frac{\min(v)}{\tau} < \eta_0$.

$$\begin{aligned}\mathrm{VAR}[\hat{RG}_2^{(U)} \mid v] &= (1 - \eta_0)\left(4\tau(\max(v) - \tau) - RG_2(v)\right)^2 + \int_{\min(v)/\tau}^{\eta_0}\left(2\tau(\max(v) - u\tau) - RG_2(v)\right)^2 du + \frac{\min(v)}{\tau}RG_4(v) \\ &= \frac{\max(v) - \tau}{\tau}\left(4\tau(\max(v) - \tau) - RG_2(v)\right)^2 + \frac{\min(v)}{\tau}RG_4(v) + \frac{(RG_2(v) + 4\tau^2 - 4\max(v)\tau)^3 - RG_3(v)(RG(v) - 2\tau)^3}{6\tau^2} \end{aligned}$$



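C Variance of $\hat{RG}^{(L)}$ and $\hat{RG}_2^{(L)}$

The estimators $\hat{RG}^{(L)}$ and $\hat{RG}_2^{(L)}$, provided in Tables 6 and 7, are obtained using (10). We calculate their variance.
Variance of $\hat{RG}^{(L)}$: When $\max(v) \le \tau$, we have $\hat{RG}^{(L)} = \tau\ln(\frac{\max(v)}{u\tau})$ when $1 \le |S| \le r - 1$ and $\hat{RG}^{(L)} =
\tau\ln(\frac{\max(v)}{\min(v)})$ when $|S| = r$. The variance is

$$\begin{aligned}\mathrm{VAR}[\hat{RG}^{(L)} \mid v] &= \left(1 - \frac{\max(v)}{\tau}\right)RG(v)^2 + \int_{\min(v)/\tau}^{\max(v)/\tau}\left(RG(v) - \tau\ln\frac{\max(v)}{u\tau}\right)^2 du + \frac{\min(v)}{\tau}\left(RG(v) - \tau\ln\frac{\max(v)}{\min(v)}\right)^2 \\ &= -2\tau\min(v)\ln\left(\frac{\max(v)}{\min(v)}\right) + 2RG(v)\tau - RG(v)^2 \end{aligned}$$

When $\min(v) < \tau < \max(v)$, $\hat{RG}^{(L)} = \max(v) - \tau + \tau\ln(\frac{1}{u})$ when $1 \le |S| \le r - 1$ and $\hat{RG}^{(L)} = \max(v) - \tau +
\tau\ln(\frac{\tau}{\min(v)})$ when $|S| = r$. The variance is $\mathrm{VAR}[\hat{RG}^{(L)} \mid v] = \tau^2 - \min(v)^2 - 2\tau\min(v)\ln(\frac{\tau}{\min(v)})$.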
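As a worked check of the last formula, the intermediate steps for the case $\min(v) < \tau < \max(v)$ can be written out as follows. This is our own derivation sketch; the shorthand $a = \min(v)/\tau$ and the random variable $X$ are introduced only for this illustration. The estimate equals $\max(v) - \tau + \tau X$, where $X = \ln(1/u)$ for $u > a$ and $X = \ln(1/a)$ for $u \le a$, so only $X$ contributes to the variance.

```latex
\begin{align*}
\mathrm{E}[X] &= \int_a^1 \ln\frac{1}{u}\,du + a\ln\frac{1}{a}
              = (1 - a + a\ln a) - a\ln a = 1 - a,\\
\mathrm{E}[X^2] &= \int_a^1 \ln^2 u\,du + a\ln^2 a
              = (2 - a\ln^2 a + 2a\ln a - 2a) + a\ln^2 a = 2 - 2a + 2a\ln a,\\
\mathrm{VAR}[\tau X] &= \tau^2\left(\mathrm{E}[X^2] - \mathrm{E}[X]^2\right)
              = \tau^2\left(1 - a^2 + 2a\ln a\right)
              = \tau^2 - \min(v)^2 - 2\tau\min(v)\ln\frac{\tau}{\min(v)} .
\end{align*}
```

The first line also confirms unbiasedness: the expectation of the estimate is $\max(v) - \tau + \tau(1 - a) = \max(v) - \min(v)$.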

Variance of $\hat{RG}_2^{(L)}$:

If $\min(v) < \tau < \max(v)$, $\hat{RG}_2^{(L)} = \max(v)^2 - \tau^2 + 2\tau(u\tau - \max(v) + \max(v)\ln\frac{1}{u})$ when $|S| \in [r - 1]$ and
$\hat{RG}_2^{(L)} = \max(v)^2 - \tau^2 + 2\tau(\min(v) - \max(v) + \max(v)\ln\frac{\tau}{\min(v)})$ when $|S| = r$. The variance is

$$\begin{aligned}\mathrm{VAR}[\hat{RG}_2^{(L)} \mid v] ={}& 4\max(v)\min(v)\tau(\min(v) - 2\max(v))\ln\frac{\tau}{\min(v)} + 4\max(v)\min(v)\tau^2 + \frac{\tau^4}{3} - 6\max(v)\min(v)^2\tau - \min(v)^4 \\ &- 4\max(v)^2\min(v)^2 + 4\max(v)\min(v)^3 + 4\max(v)^2\tau^2 - 2\max(v)\tau^3 + \frac{8\min(v)^3\tau}{3} - 2\min(v)^2\tau^2 \end{aligned}$$

If $\max(v) \le \tau$, $\hat{RG}_2^{(L)} = 2\tau(u\tau - \max(v) + \max(v)\ln\frac{\max(v)}{u\tau})$ when $|S| \in [r - 1]$ and $\hat{RG}_2^{(L)} = 2\tau(\min(v) -
\max(v) + \max(v)\ln\frac{\max(v)}{\min(v)})$ when $|S| = r$. The variance is

$$\mathrm{VAR}[\hat{RG}_2^{(L)} \mid v] = -4\tau\max(v)\min(v)\ln\left(\frac{\max(v)}{\min(v)}\right)(2\max(v) - \min(v)) + \frac{2\tau}{3}\left(5\max(v)^3 - 9\max(v)\min(v)^2 + 4\min(v)^3\right) - RG_4(v)$$